Fast Support Vector Machines Using Parallel Adaptive Shrinking on Distributed Systems

Abstract

Support Vector Machines (SVMs), a popular machine learning technique, have been applied to a wide range of domains such as science, finance, and social networks for supervised learning. Whether it is health-care professionals identifying high-risk patients or school districts identifying high-school students likely to enroll in college, SVMs can play a major role for social good. This paper undertakes the challenge of designing a scalable parallel SVM training algorithm for large-scale systems, which include commodity multi-core machines, tightly connected supercomputers, and cloud computing systems. Intuitive techniques for improving the time-space complexity are proposed, including adaptive elimination of samples for faster convergence and sparse format representation. Under sample elimination, several heuristics ranging from "earliest possible" to "lazy" elimination of non-contributing samples are proposed. In several cases where an early sample elimination might result in a false positive, low-overhead mechanisms for reconstruction of key data structures are proposed. The algorithm and heuristics are implemented and evaluated on various publicly available datasets. Empirical evaluation shows up to a 26x speed improvement on some datasets against the sequential baseline when evaluated on multiple compute nodes, and an improvement in execution time of up to 30-60% is readily observed on a number of other datasets against our parallel baseline.
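The abstract does not spell out the elimination heuristics, so the following is only a minimal sketch of the general idea of sample elimination (shrinking), assuming the standard rule used in SMO-style dual solvers rather than the paper's own method. A sample whose multiplier sits at a bound (0 or C) and which cannot form a violating pair with any other sample is treated as non-contributing and can be dropped from the active set. The function name `shrink_candidates`, its parameters, and the toy values are hypothetical and chosen only for illustration.

```python
import numpy as np

def shrink_candidates(alpha, y, f, C, tol=1e-12):
    """Boolean mask of samples that could be dropped from the active set.

    Assumed (not from the paper): f[i] = sum_j alpha[j]*y[j]*K(x_i, x_j) - y[i],
    and the textbook SMO index sets I_up and I_low.
    """
    at_zero = alpha <= tol
    at_C = alpha >= C - tol
    free = ~at_zero & ~at_C

    # I_up: samples whose alpha*y can still increase; I_low: can still decrease.
    in_up = free | (at_zero & (y > 0)) | (at_C & (y < 0))
    in_low = free | (at_C & (y > 0)) | (at_zero & (y < 0))

    b_up = f[in_up].min() if in_up.any() else np.inf
    b_low = f[in_low].max() if in_low.any() else -np.inf

    # A bounded sample in I_up only violates optimality if f < b_low;
    # a bounded sample in I_low only violates it if f > b_up.
    only_up = in_up & ~in_low
    only_low = in_low & ~in_up
    return (only_up & (f > b_low)) | (only_low & (f < b_up))

# Toy values, made up for illustration only.
alpha = np.array([0.0, 0.5, 1.0, 0.0, 1.0])
y = np.array([1, 1, -1, -1, 1])
f = np.array([2.0, 0.1, 3.0, -2.0, 0.5])
mask = shrink_candidates(alpha, y, f, C=1.0)
print(mask)
```

Free samples (0 < alpha < C) are never eliminated under this rule. Because the thresholds move as training proceeds, a sample dropped early may later turn out to matter, which is the false-positive case for which the abstract proposes reconstructing key data structures.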